Google クライアントライブラリ for Python で、BigQuery のデータセットやテーブルなどのメタ情報を取得してみた

2020.04.16

この記事は公開されてから1年以上経過しています。情報が古い可能性がありますので、ご注意ください。

こんにちは、みかみです。

BigQuery のデータセットやテーブル、UDF などの情報は、INFORMATION_SCHEMA の各ビューから取得できますが、オンデマンド料金の場合、現在のところ INFORMATION_SCHEMA に対して SELECT 文を実行するとクエリ課金が発生します。

BigQuery INFORMATION_SCHEMA の概要｜ BigQuery ドキュメント

一方、list、get などのメタデータオペレーションは無料で使用でき、課金は発生しません。

無料のオペレーション｜ BigQuery ドキュメント

Google クライアントライブラリ for Python （ google-cloud-bigquery ）を使用して、BigQuery REST API 経由ではどんなリソース情報が取得できるか確認してみました。

Python Client for Google BigQuery

やりたいこと

Google クライアントライブラリ for Python （ google-cloud-bigquery ）でデータセットなどのリソース情報を取得する方法を知りたい
クライアントライブラリで取得できるリソース情報の詳細項目を知りたい
クライアントライブラリと INFORMATION_SCHEMA で取得できる情報には差分があるのかどうか知りたい

前提

動作確認に使用したクライアントライブラリのバージョンは以下です。

(test_bq) [ec2-user@ip-10-0-43-239 get_info]$ pip list | grep google-cloud-bigquery
google-cloud-bigquery    1.24.0

データセット情報

クライアントライブラリの以下のメソッドでデータセット情報を取得できます

list_datasets：プロジェクト内のデータセット一覧
get_dataset：データセットの詳細情報

list_datasets で取得できる項目を確認

以下の Python コードで、list_datasets() の戻り値を print して確認してみます。

from google.cloud import bigquery
from pprint import pprint

client = bigquery.Client()

project_id = 'cm-da-mikami-yuki-258308'
datasets = client.list_datasets(project_id)
if datasets:
    for obj in datasets:
        print('-------->')
        pprint(vars(obj))
else:
    print("This project does not contain any datasets.")

(test_bq) [ec2-user@ip-10-0-43-239 get_info]$ python get_dataset_info.py
-------->
{'_properties': {'datasetReference': {'datasetId': 'airflow_test',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'id': 'cm-da-mikami-yuki-258308:airflow_test',
                 'kind': 'bigquery#dataset',
                 'location': 'US'}}
-------->
{'_properties': {'datasetReference': {'datasetId': 'billing',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'id': 'cm-da-mikami-yuki-258308:billing',
                 'kind': 'bigquery#dataset',
                 'location': 'asia-northeast1'}}
-------->
{'_properties': {'datasetReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'id': 'cm-da-mikami-yuki-258308:dataset_1',
                 'kind': 'bigquery#dataset',
                 'labels': {'env': 'test'},
                 'location': 'asia-northeast1'}}
-------->
{'_properties': {'datasetReference': {'datasetId': 'dataset_2',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'id': 'cm-da-mikami-yuki-258308:dataset_2',
                 'kind': 'bigquery#dataset',
                 'location': 'asia-northeast1'}}
-------->
{'_properties': {'datasetReference': {'datasetId': 'test_dataset_option_all',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'friendlyName': 'オプション確認用',
                 'id': 'cm-da-mikami-yuki-258308:test_dataset_option_all',
                 'kind': 'bigquery#dataset',
                 'labels': {'option': 'all'},
                 'location': 'asia-northeast1'}}

リファレンス情報（プロジェクト ID、データセット ID ）
データセット ID
リソース種別（ bigquery#dataset ）
ロケーション
ラベル情報
friendly name

が取得できました。

google.cloud.bigquery.dataset.DatasetListItem ｜ Python Client for Google BigQuery

get_dataset で取得できる項目を確認

続いて、get_dataset() メソッドの戻り値を確認するため、Python コードを以下に書き換えて実行してみます。

from google.cloud import bigquery
from pprint import pprint

client = bigquery.Client()

project_id = 'cm-da-mikami-yuki-258308'
datasets = client.list_datasets(project_id)
if datasets:
    for obj in datasets:
        print('-------->')
#        pprint(vars(obj))
        dataset = client.get_dataset(obj.reference)
        pprint(vars(dataset))
else:
    print("This project does not contain any datasets.")

(test_bq) [ec2-user@ip-10-0-43-239 get_info]$ python get_dataset_info.py
-------->
{'_properties': {'access': [{'role': 'WRITER',
                             'specialGroup': 'projectWriters'},
                            {'role': 'OWNER', 'specialGroup': 'projectOwners'},
                            {'role': 'OWNER',
                             'userByEmail': 'xxxx@classmethod.jp'},
                            {'role': 'READER',
                             'specialGroup': 'projectReaders'}],
                 'creationTime': '1584102236040',
                 'datasetReference': {'datasetId': 'airflow_test',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'etag': 'edTJxxxxcQ==',
                 'id': 'cm-da-mikami-yuki-258308:airflow_test',
                 'kind': 'bigquery#dataset',
                 'lastModifiedTime': '1584102236040',
                 'location': 'US',
                 'selfLink': 'https://bigquery.googleapis.com/bigquery/v2/projects/cm-da-mikami-yuki-258308/datasets/airflow_test'}}
-------->
{'_properties': {'access': [{'role': 'WRITER',
                             'specialGroup': 'projectWriters'},
                            {'role': 'OWNER', 'specialGroup': 'projectOwners'},
                            {'role': 'OWNER',
                             'userByEmail': 'billing-export-bigquery@system.gserviceaccount.com'},
                            {'role': 'OWNER',
                             'userByEmail': 'xxxx@classmethod.jp'},
                            {'role': 'READER',
                             'specialGroup': 'projectReaders'}],
                 'creationTime': '1585283075708',
                 'datasetReference': {'datasetId': 'billing',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'etag': 'xNPkxxxxeQ==',
                 'id': 'cm-da-mikami-yuki-258308:billing',
                 'kind': 'bigquery#dataset',
                 'lastModifiedTime': '1585283726009',
                 'location': 'asia-northeast1',
                 'selfLink': 'https://bigquery.googleapis.com/bigquery/v2/projects/cm-da-mikami-yuki-258308/datasets/billing'}}
-------->
{'_properties': {'access': [{'role': 'WRITER',
                             'specialGroup': 'projectWriters'},
                            {'role': 'WRITER',
                             'userByEmail': 'service-797147019523@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com'},
                            {'role': 'OWNER', 'specialGroup': 'projectOwners'},
                            {'role': 'OWNER',
                             'userByEmail': 'xxxx@classmethod.jp'},
                            {'role': 'READER',
                             'specialGroup': 'projectReaders'},
                            {'role': 'READER',
                             'userByEmail': 'test-mikami-account-c@cm-da-mikami-yuki-258308.iam.gserviceaccount.com'}],
                 'creationTime': '1585896779508',
                 'datasetReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'etag': 'xDnMxxxxYQ==',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1',
                 'kind': 'bigquery#dataset',
                 'labels': {'env': 'test'},
                 'lastModifiedTime': '1586516882711',
                 'location': 'asia-northeast1',
                 'selfLink': 'https://bigquery.googleapis.com/bigquery/v2/projects/cm-da-mikami-yuki-258308/datasets/dataset_1'}}
-------->
{'_properties': {'access': [{'role': 'WRITER',
                             'specialGroup': 'projectWriters'},
                            {'role': 'OWNER', 'specialGroup': 'projectOwners'},
                            {'role': 'OWNER',
                             'userByEmail': 'xxxx@classmethod.jp'},
                            {'role': 'READER',
                             'specialGroup': 'projectReaders'}],
                 'creationTime': '1585896793308',
                 'datasetReference': {'datasetId': 'dataset_2',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'description': '説明付きのデータセット',
                 'etag': 'bVufxxxxqg==',
                 'id': 'cm-da-mikami-yuki-258308:dataset_2',
                 'kind': 'bigquery#dataset',
                 'lastModifiedTime': '1586405357740',
                 'location': 'asia-northeast1',
                 'selfLink': 'https://bigquery.googleapis.com/bigquery/v2/projects/cm-da-mikami-yuki-258308/datasets/dataset_2'}}
-------->
{'_properties': {'access': [{'role': 'WRITER',
                             'specialGroup': 'projectWriters'},
                            {'role': 'OWNER', 'specialGroup': 'projectOwners'},
                            {'role': 'OWNER',
                             'userByEmail': 'test-mikami@cm-da-mikami-yuki-258308.iam.gserviceaccount.com'},
                            {'role': 'READER',
                             'specialGroup': 'projectReaders'}],
                 'creationTime': '1586419321609',
                 'datasetReference': {'datasetId': 'test_dataset_option_all',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'defaultTableExpirationMs': '7776000000',
                 'description': 'オプション情報取得確認用のデータセット',
                 'etag': 'dYBlxxxx3w==',
                 'friendlyName': 'オプション確認用',
                 'id': 'cm-da-mikami-yuki-258308:test_dataset_option_all',
                 'kind': 'bigquery#dataset',
                 'labels': {'option': 'all'},
                 'lastModifiedTime': '1586419430241',
                 'location': 'asia-northeast1',
                 'selfLink': 'https://bigquery.googleapis.com/bigquery/v2/projects/cm-da-mikami-yuki-258308/datasets/test_dataset_option_all'}}

※一部、伏字に変更しています。

list_datasets で取得できる項目に加えて、以下の情報も取得できました。

アクセス制御情報
作成日時（Unixtime）
デフォルトのテーブルの有効期限
description
ETag 値
最終更新日時（Unixtime）
API リクエスト URL

google.cloud.bigquery.dataset.Dataset ｜ Python Client for Google BigQuery

テーブル / ビュー情報

テーブル / ビュー情報の取得には、以下のメソッドを使用します。

list_tables：データセット内のテーブル / ビュー一覧
get_table：テーブル / ビューの詳細情報

list_tables で取得できる項目を確認

list_tables で特定のデータセット内のテーブルの情報を取得してみます。

from google.cloud import bigquery
from pprint import pprint

client = bigquery.Client()

dataset_id = 'cm-da-mikami-yuki-258308.dataset_1'
dataset = client.get_dataset(dataset_id)
tables = client.list_tables(dataset)
if tables:
    for obj in tables:
        print('-------->')
        pprint(vars(obj))
else:
    print("\tThis dataset does not contain any tables.")

(test_bq) [ec2-user@ip-10-0-43-239 get_info]$ python get_table_info.py
-------->
{'_properties': {'creationTime': '1586855793633',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.pos_partition_date',
                 'kind': 'bigquery#table',
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'pos_partition_date'},
                 'timePartitioning': {'field': 'first_date', 'type': 'DAY'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586853331458',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.pos_partition_loadtime',
                 'kind': 'bigquery#table',
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'pos_partition_loadtime'},
                 'timePartitioning': {'type': 'DAY'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586857582005',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.pos_partition_loadtime_with_filter',
                 'kind': 'bigquery#table',
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'pos_partition_loadtime_with_filter'},
                 'timePartitioning': {'requirePartitionFilter': True,
                                      'type': 'DAY'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586856044131',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.pos_partition_val',
                 'kind': 'bigquery#table',
                 'rangePartitioning': {'field': 'JANCD',
                                       'range': {'end': '5000000000000',
                                                 'interval': '100000000',
                                                 'start': '4000000000000'}},
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'pos_partition_val'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1585897296621',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_dogs',
                 'kind': 'bigquery#table',
                 'labels': {'type': 'dog'},
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'table_dogs'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586430585167',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_dogs_nested',
                 'kind': 'bigquery#table',
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'table_dogs_nested'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1587015388951',
                 'expirationTime': '1590935802000',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_expiring',
                 'kind': 'bigquery#table',
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'table_expiring'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586778384776',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_external',
                 'kind': 'bigquery#table',
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'table_external'},
                 'type': 'EXTERNAL'}}
-------->
{'_properties': {'creationTime': '1586424010456',
                 'friendlyName': '分かりやすい名前付きのテーブル',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_friendly_name',
                 'kind': 'bigquery#table',
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'table_friendly_name'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586422712110',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_kms',
                 'kind': 'bigquery#table',
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'table_kms'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586425128064',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_partition_expiring',
                 'kind': 'bigquery#table',
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'table_partition_expiring'},
                 'timePartitioning': {'expirationMs': '36000000',
                                      'type': 'DAY'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1587015640942',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.view_dogs',
                 'kind': 'bigquery#table',
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'view_dogs'},
                 'type': 'VIEW',
                 'view': {'useLegacySql': False}}}
-------->
{'_properties': {'creationTime': '1586518663164',
                 'friendlyName': 'フレンドリーネーム',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.view_dogs_2',
                 'kind': 'bigquery#table',
                 'labels': {'type': 'dog'},
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'view_dogs_2'},
                 'type': 'VIEW',
                 'view': {'useLegacySql': False}}}

以下の項目が取得できました。

作成日時（Unixtime）
テーブル ID
リソース種別（ bigquery#table ）
ラベル情報
リファレンス情報（プロジェクト ID、データセット ID、テーブル ID ）
テーブル種別（ TABLE or VIEW or EXTERNAL ）
有効期限（Unixtime）
friendly name
分割タイプ、パーティションフィルタ要否
パーティション範囲情報
パーティションの有効期限（Unixtime）

google.cloud.bigquery.table.TableListItem ｜ Python Client for Google BigQuery

get_table で取得できる項目を確認

データセット情報同様、get_table に変更して実行してみます。

from google.cloud import bigquery
from pprint import pprint

client = bigquery.Client()

dataset_id = 'cm-da-mikami-yuki-258308.dataset_1'
dataset = client.get_dataset(dataset_id)
tables = client.list_tables(dataset)
if tables:
    for obj in tables:
        print('-------->')
#        pprint(vars(obj))
        table = client.get_table(obj.reference)
        pprint(vars(table))
else:
    print("\tThis dataset does not contain any tables.")

(test_bq) [ec2-user@ip-10-0-43-239 get_info]$ python get_table_info.py
-------->
{'_properties': {'creationTime': '1586855793633',
                 'etag': 'yziexxxxcA==',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.pos_partition_date',
                 'kind': 'bigquery#table',
                 'lastModifiedTime': '1586855793633',
                 'location': 'asia-northeast1',
                 'numBytes': '14192',
                 'numLongTermBytes': '0',
                 'numRows': '100',
                 'schema': {'fields': [{'mode': 'REQUIRED',
                                        'name': 'id',
                                        'type': 'INTEGER'},
                                       {'mode': 'REQUIRED',
                                        'name': 'JANCD',
                                        'type': 'INTEGER'},
                                       {'mode': 'NULLABLE',
                                        'name': 'maker',
                                        'type': 'STRING'},
                                       {'mode': 'NULLABLE',
                                        'name': 'name',
                                        'type': 'STRING'},
                                       {'mode': 'NULLABLE',
                                        'name': 'first_date',
                                        'type': 'DATE'},
                                       {'mode': 'NULLABLE',
                                        'name': 'PI_amount',
                                        'type': 'INTEGER'},
                                       {'mode': 'NULLABLE',
                                        'name': 'PI_week',
                                        'type': 'STRING'},
                                       {'mode': 'NULLABLE',
                                        'name': 'store_percentage',
                                        'type': 'STRING'},
                                       {'mode': 'NULLABLE',
                                        'name': 'price_avg',
                                        'type': 'INTEGER'}]},
                 'selfLink': 'https://bigquery.googleapis.com/bigquery/v2/projects/cm-da-mikami-yuki-258308/datasets/dataset_1/tables/pos_partition_date',
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'pos_partition_date'},
                 'timePartitioning': {'field': 'first_date', 'type': 'DAY'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586853331458',
                 'etag': 'f1i3xxxxUQ==',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.pos_partition_loadtime',
（省略）
                 'timePartitioning': {'type': 'DAY'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586857582005',
                 'etag': 'd8edxxxxvA==',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.pos_partition_loadtime_with_filter',
（省略）
                 'requirePartitionFilter': True,
（省略）
                 'timePartitioning': {'requirePartitionFilter': True,
                                      'type': 'DAY'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586856044131',
                 'etag': '2hWLxxxxBg==',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.pos_partition_val',
（省略）
                 'rangePartitioning': {'field': 'JANCD',
                                       'range': {'end': '5000000000000',
                                                 'interval': '100000000',
                                                 'start': '4000000000000'}},
（省略）
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1585897296621',
                 'description': 'テスト用',
                 'etag': 'BXWKxxxx3g==',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_dogs',
                 'kind': 'bigquery#table',
                 'labels': {'type': 'dog'},
（省略）
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586430585167',
                 'etag': '+ipdxxxxdw==',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_dogs_nested',
                 'kind': 'bigquery#table',
                 'lastModifiedTime': '1586430657420',
                 'location': 'asia-northeast1',
                 'numBytes': '0',
                 'numLongTermBytes': '0',
                 'numRows': '0',
                 'schema': {'fields': [{'description': 'Object',
                                        'fields': [{'description': '名前',
                                                    'mode': 'REQUIRED',
                                                    'name': 'name',
                                                    'type': 'STRING'},
                                                   {'description': '性別',
                                                    'name': 'gender',
                                                    'type': 'STRING'},
                                                   {'description': '年齢',
                                                    'name': 'age',
                                                    'type': 'STRING'}],
                                        'mode': 'REPEATED',
                                        'name': 'dog',
                                        'type': 'RECORD'}]},
                 'selfLink': 'https://bigquery.googleapis.com/bigquery/v2/projects/cm-da-mikami-yuki-258308/datasets/dataset_1/tables/table_dogs_nested',
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'table_dogs_nested'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1587015388951',
                 'etag': 'aXNQxxxxow==',
                 'expirationTime': '1590935802000',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_expiring',
（省略）
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586778384776',
                 'etag': 'ad2Zxxxx+Q==',
                 'externalDataConfiguration': {'sourceFormat': 'CSV',
                                               'sourceUris': ['gs://test-mikami/yob1980.txt']},
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_external',
（省略）
                 'type': 'EXTERNAL'}}
-------->
{'_properties': {'creationTime': '1586424010456',
                 'etag': 'XsHAxxxxag==',
                 'friendlyName': '分かりやすい名前付きのテーブル',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_friendly_name',
（省略）
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586422712110',
                 'encryptionConfiguration': {'kmsKeyName': 'projects/cm-da-mikami-yuki-258308/locations/asia-northeast1/keyRings/xxxx/cryptoKeys/xxxxi'},
                 'etag': 'uRqMxxxxfg==',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_kms',
（省略）
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1586425128064',
                 'etag': 'tAw/xxxx5Q==',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.table_partition_expiring',
（省略）
                 'timePartitioning': {'expirationMs': '36000000',
                                      'type': 'DAY'},
                 'type': 'TABLE'}}
-------->
{'_properties': {'creationTime': '1587015640942',
                 'etag': 'mWbdxxxxXg==',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.view_dogs',
（省略）
                 'type': 'VIEW',
                 'view': {'query': 'SELECT id, name FROM '
                                   'cm-da-mikami-yuki-258308.dataset_1.table_dogs',
                          'useLegacySql': False}}}
-------->
{'_properties': {'creationTime': '1586518663164',
                 'description': 'ビューからビュー',
                 'etag': '0Qokxxxx9g==',
                 'friendlyName': 'フレンドリーネーム',
                 'id': 'cm-da-mikami-yuki-258308:dataset_1.view_dogs_2',
                 'kind': 'bigquery#table',
                 'labels': {'type': 'dog'},
                 'lastModifiedTime': '1586518663164',
                 'location': 'asia-northeast1',
                 'numBytes': '0',
                 'numLongTermBytes': '0',
                 'numRows': '0',
                 'schema': {'fields': [{'name': 'name', 'type': 'STRING'}]},
                 'selfLink': 'https://bigquery.googleapis.com/bigquery/v2/projects/cm-da-mikami-yuki-258308/datasets/dataset_1/tables/view_dogs_2',
                 'tableReference': {'datasetId': 'dataset_1',
                                    'projectId': 'cm-da-mikami-yuki-258308',
                                    'tableId': 'view_dogs_2'},
                 'type': 'VIEW',
                 'view': {'query': 'SELECT name FROM '
                                   'cm-da-mikami-yuki-258308.dataset_1.view_dogs',
                          'useLegacySql': False}}}

※一部省略＆伏字に変更しています。

list_tables で取得できる項目に加えて、以下の情報が取得できました。

テーブル ID
データサイズ（ byte ）
行数
テーブルスキーマ情報
ロケーション
description
ロケーション
ETag 値
最終更新日時（Unixtime）
API のリクエスト URL
外部データソース
KMS キー
View 定義、View がレガシー SQL で作成されたかどうか

google.cloud.bigquery.table.Table ｜ Python Client for Google BigQuery

ルーティン情報

UDF やプロシージャの情報は、以下のメソッドで取得できます。

list_routines：データセット内のルーティン一覧
get_routine：ルーティンの詳細情報

list_routines で取得できる項目を確認

list_routines で UDF、プロシージャの一覧を取得してみます。

from google.cloud import bigquery
from pprint import pprint

client = bigquery.Client()

dataset_id = 'cm-da-mikami-yuki-258308.dataset_1'
dataset = client.get_dataset(dataset_id)
routines = client.list_routines(dataset)
if routines:
    for obj in routines:
        print('-------->')
        pprint(vars(obj))
else:
    print("\tThis dataset does not contain any routines.")

(test_bq) [ec2-user@ip-10-0-43-239 get_info]$ python get_routine_info.py
-------->
{'_properties': {'creationTime': '1586493860018',
                 'etag': 'VEQYxxxxIA==',
                 'language': 'SQL',
                 'lastModifiedTime': '1586493860018',
                 'routineReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308',
                                      'routineId': 'addFourAndDivideAny'},
                 'routineType': 'SCALAR_FUNCTION'}}
-------->
{'_properties': {'creationTime': '1586513017763',
                 'etag': 'cx5rxxxxjQ==',
                 'language': 'SQL',
                 'lastModifiedTime': '1586513017763',
                 'routineReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308',
                                      'routineId': 'test_function'},
                 'routineType': 'SCALAR_FUNCTION'}}
-------->
{'_properties': {'creationTime': '1586506722471',
                 'etag': '6b9wxxxx2w==',
                 'language': 'SQL',
                 'lastModifiedTime': '1586506722471',
                 'routineReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308',
                                      'routineId': 'test_procedure'},
                 'routineType': 'PROCEDURE'}}
-------->
{'_properties': {'creationTime': '1586513263155',
                 'etag': 'bR4RxxxxBw==',
                 'language': 'SQL',
                 'lastModifiedTime': '1586513263155',
                 'routineReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308',
                                      'routineId': 'test_procedure_description'},
                 'routineType': 'PROCEDURE'}}
-------->
{'_properties': {'creationTime': '1586508765922',
                 'etag': 'Lrvjxxxx6A==',
                 'language': 'JAVASCRIPT',
                 'lastModifiedTime': '1586508765922',
                 'routineReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308',
                                      'routineId': 'timesTwo'},
                 'routineType': 'SCALAR_FUNCTION'}}

※一部、伏字に変更しています。

以下の情報が取得できました。

作成日時（Unixtime）
ETag 値
UDF 定義言語
最終更新日時（Unixtime）
リファレンス情報（プロジェクト ID、データセット ID、ルーティン ID ）
ルーティン種別（ SCALAR_FUNCTION or PROCEDURE ）

google.cloud.bigquery.routine.Routine ｜ Python Client for Google BigQuery

get_routine で取得できる項目を確認

より詳細なルーティン情報は、get_routine で取得します。

from google.cloud import bigquery
from pprint import pprint

client = bigquery.Client()

dataset_id = 'cm-da-mikami-yuki-258308.dataset_1'
dataset = client.get_dataset(dataset_id)
routines = client.list_routines(dataset)
if routines:
    for obj in routines:
        print('-------->')
#        pprint(vars(obj))
        routine = client.get_routine(obj.reference)
        pprint(vars(routine))
else:
    print("\tThis dataset does not contain any routines.")

(test_bq) [ec2-user@ip-10-0-43-239 get_info]$ python get_routine_info.py
-------->
{'_properties': {'arguments': [{'argumentKind': 'ANY_TYPE', 'name': 'x'},
                               {'argumentKind': 'ANY_TYPE', 'name': 'y'}],
                 'creationTime': '1586493860018',
                 'definitionBody': '(x + 4) / y',
                 'etag': 'VEQYxxxxIA==',
                 'language': 'SQL',
                 'lastModifiedTime': '1586493860018',
                 'routineReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308',
                                      'routineId': 'addFourAndDivideAny'},
                 'routineType': 'SCALAR_FUNCTION'}}
-------->
{'_properties': {'creationTime': '1586513017763',
                 'definitionBody': '1*2',
                 'etag': 'cx5rxxxxjQ==',
                 'importedLibraries': ['gs://cloud-samples-data/bigquery/udfs/max-value.js'],
                 'language': 'SQL',
                 'lastModifiedTime': '1586513017763',
                 'routineReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308',
                                      'routineId': 'test_function'},
                 'routineType': 'SCALAR_FUNCTION'}}
-------->
{'_properties': {'arguments': [{'dataType': {'typeKind': 'INT64'},
                                'mode': 'IN',
                                'name': 'id'},
                               {'dataType': {'typeKind': 'STRING'},
                                'mode': 'IN',
                                'name': 'value'}],
                 'creationTime': '1586506722471',
                 'definitionBody': 'BEGIN\n'
                                   '  INSERT INTO '
                                   'dataset_1.table_with_description\n'
                                   '  VALUES (id, value);\n'
                                   'END',
                 'etag': '6b9wxxxx2w==',
                 'language': 'SQL',
                 'lastModifiedTime': '1586506722471',
                 'routineReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308',
                                      'routineId': 'test_procedure'},
                 'routineType': 'PROCEDURE'}}
-------->
{'_properties': {'creationTime': '1586513263155',
                 'definitionBody': 'SELECT 1;',
                 'description': 'プロシージャの説明',
                 'etag': 'bR4RxxxxBw==',
                 'language': 'SQL',
                 'lastModifiedTime': '1586513263155',
                 'routineReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308',
                                      'routineId': 'test_procedure_description'},
                 'routineType': 'PROCEDURE'}}
-------->
{'_properties': {'arguments': [{'dataType': {'typeKind': 'FLOAT64'},
                                'name': 'x'}],
                 'creationTime': '1586508765922',
                 'definitionBody': '\nlib = "sample.js"\nreturn x*2;\n',
                 'etag': 'Lrvjxxxx6A==',
                 'language': 'JAVASCRIPT',
                 'lastModifiedTime': '1586508765922',
                 'returnType': {'typeKind': 'FLOAT64'},
                 'routineReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308',
                                      'routineId': 'timesTwo'},
                 'routineType': 'SCALAR_FUNCTION'}}

※一部、伏字に変更しています。

list_routines でも get_routine でも、戻り値としては同じ google.cloud.bigquery.routine.Routine 型のオブジェクトが返却されますが、list_routines では NULL 値となっていた description などの以下のプロパティが追加で取得できました。

description
パラメータ情報
戻り値のデータ型
ルーティン定義本文

google.cloud.bigquery.routine.Routine ｜ Python Client for Google BigQuery

アクセス制限されているリソース情報の取得

アクセスが許可されていないリソースの情報は取得できません。

サービスアカウントを切り替えてリソース情報を取得する、以下の Python を準備しました。

from google.cloud import bigquery
from pprint import pprint
from google.oauth2 import service_account
import argparse
import os.path

parser = argparse.ArgumentParser(description='select data')
parser.add_argument('file', help='account key file')
args = parser.parse_args()

key_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), args.file)
credentials = service_account.Credentials.from_service_account_file(
    key_path,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
client = bigquery.Client(
    credentials=credentials,
    project=credentials.project_id,
)
#client = bigquery.Client()

project_id = 'cm-da-mikami-yuki-258308'
datasets = client.list_datasets(project_id)
if datasets:
    for obj in datasets:
        print('-------->')
        pprint(vars(obj))
else:
    print("This project does not contain any datasets.")

全てのデータセットにアクセス可能なサービスアカウントを指定して実行すると、以下の通り、５つ全てのデータセット情報が取得できます。

(test_bq) [ec2-user@ip-10-0-43-239 get_info]$ python get_dataset_by_account.py key_account_a.json
-------->
{'_properties': {'datasetReference': {'datasetId': 'airflow_test',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'id': 'cm-da-mikami-yuki-258308:airflow_test',
                 'kind': 'bigquery#dataset',
                 'location': 'US'}}
-------->
{'_properties': {'datasetReference': {'datasetId': 'billing',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'id': 'cm-da-mikami-yuki-258308:billing',
                 'kind': 'bigquery#dataset',
                 'location': 'asia-northeast1'}}
-------->
{'_properties': {'datasetReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'id': 'cm-da-mikami-yuki-258308:dataset_1',
                 'kind': 'bigquery#dataset',
                 'labels': {'env': 'test'},
                 'location': 'asia-northeast1'}}
-------->
{'_properties': {'datasetReference': {'datasetId': 'dataset_2',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'id': 'cm-da-mikami-yuki-258308:dataset_2',
                 'kind': 'bigquery#dataset',
                 'location': 'asia-northeast1'}}
-------->
{'_properties': {'datasetReference': {'datasetId': 'test_dataset_option_all',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'friendlyName': 'オプション確認用',
                 'id': 'cm-da-mikami-yuki-258308:test_dataset_option_all',
                 'kind': 'bigquery#dataset',
                 'labels': {'option': 'all'},
                 'location': 'asia-northeast1'}}

dataset_1 へのアクセスしか許可されていないサービスアカウントに切り替えて実行してみると、該当アカウントでアクセス可能なデータセットの情報しか取得できませんでした。

(test_bq) [ec2-user@ip-10-0-43-239 get_info]$ python get_dataset_by_account.py key_account_c.json
-------->
{'_properties': {'datasetReference': {'datasetId': 'dataset_1',
                                      'projectId': 'cm-da-mikami-yuki-258308'},
                 'id': 'cm-da-mikami-yuki-258308:dataset_1',
                 'kind': 'bigquery#dataset',
                 'labels': {'env': 'test'},
                 'location': 'asia-northeast1'}}

同様に、アクセス許可されていないデータセット内のテーブル情報を取得しようとしてみると

(test_bq) [ec2-user@ip-10-0-43-239 get_info]$ python get_table_by_account.py key_account_c.json
Traceback (most recent call last):
  File "get_table_by_account.py", line 23, in <module>
    dataset = client.get_dataset(dataset_id)
  File "/home/ec2-user/test_bq/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 586, in get_dataset
    retry, method="GET", path=dataset_ref.path, timeout=timeout
  File "/home/ec2-user/test_bq/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 556, in _call_api
    return call()
  File "/home/ec2-user/test_bq/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
    on_error=on_error,
  File "/home/ec2-user/test_bq/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/home/ec2-user/test_bq/lib/python3.7/site-packages/google/cloud/_http.py", line 423, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://bigquery.googleapis.com/bigquery/v2/projects/cm-da-mikami-yuki-258308/datasets/dataset_2: Access Denied: Dataset cm-da-mikami-yuki-258308:dataset_2: User does not have bigquery.datasets.get permission for dataset cm-da-mikami-yuki-258308:dataset_2.

パーミッションエラーとなり、テーブル情報も参照できませんでした。

INFORMATION_SCHEMA から取得できる情報と比較

以下、INFORMATION_SCHEMA から取得できる情報と、クライアントライブラリから取得できる情報の一覧です。

リソース	項目	INFORMATION_SCHEMA	Python Client Library
データセット	プロジェクト ID	〇	〇
	データセット ID （データセット名）	〇	〇
	作成日時	〇	〇
	最終更新日時	〇	〇
	ロケーション	〇	〇
	デフォルトのテーブルの有効期限	〇	〇
	description	〇	〇
	friendly name	〇	〇
	ラベル情報	〇	〇
	リソース種別（ bigquery#dataset ）	×	〇
	アクセス制御情報	×	〇
	API リクエスト URL	×	〇
	ETag 値	×	〇
テーブル / ビュー / カラム	プロジェクト ID	〇	〇
	データセット ID （データセット名）	〇	〇
	テーブル ID （テーブル名）	〇	〇
	テーブル種別	〇	〇
	INSERT 文が実行可能かどうか	〇	×
	作成日時	〇	〇
	有効期限	〇	〇
	最終更新日時	×	〇
	パーティションの有効期限	〇	〇
	description	〇	〇
	friendly name	〇	〇
	ラベル情報	〇	〇
	KMS キー	〇	〇
	外部データソース	×	〇
	パーティションフィルタ要否	〇	〇
	パーティション範囲情報	×	〇
	分割タイプ	×	〇
	データサイズ（ byte ）	×（※1）	〇
	行数	×（※1）	〇
	ロケーション	×	〇
	リソース種別（ bigquery#table ）	×	〇
	アクセス制御情報	×	〇
	API リクエスト URL	×	〇
	ETag 値	×	〇
	ビュー定義	〇	〇
	ビューがレガシー SQL で作成されたかどうか	〇	〇
	カラム名	〇	〇（※2）
	カラム順	〇	×
	データ型	〇	〇（※2）
	NULLABLE かどうか	〇	〇（※2）
	疑似列かどうか	〇	×
	パーティショニングカラムかどうか	〇	×
	クラスタリングカラム順	〇	×
	カラムの description	〇	〇（※2）
	カラムパス	〇	〇（※2）
ルーティン	プロジェクト ID	〇	〇
	データセット ID （データセット名）	〇	〇
	ルーティン ID （ルーティン名）	〇	〇
	ルーティン種別	〇	〇
	パラメータ情報	〇	〇
	パラメータの順序	〇	×
	戻り値のデータ型	〇	〇
	UDF 定義言語	〇	〇
	ルーティン定義本文	〇	〇
	決定的関数かどうか	〇	×
	作成日時	〇	〇
	最終更新日時	〇	〇
	description	〇	〇
	外部ライブラリ	〇	〇
	ETag 値	×	〇

※1：メタテーブル（ __TABLES__ ）から取得可能（メタテーブルに対するクエリ課金はなし）

※2：JSON 形式のテーブルスキーマとして取得可能

まとめ（所感）

Python クライアントライブラリでも、INFORMATION_SCHEMA とほぼ同等の情報が取得できました。

現状、クライアントライブラリの方が取得できる項目は少し多いようですが、2020/04 現在 INFORMATION_SCHEMA はまだベータ版なので、 INFORMATION_SCHEMA でも今後さらに取得可能な項目が増える可能性もありそうです。

課金の面からみても現時点ではクライアントライブラリの方に分がある印象を受けましたが、クライアントライブラリでテーブルの詳細情報を取得する場合はテーブル数分の API コールが発生することになるので、処理時間が若干心配です。要件に合わせて適切な I/F を選択する必要がありそうです。

Google クライアントライブラリ for Python で、BigQuery のデータセットやテーブルなどのメタ情報を取得してみた

やりたいこと

目次

前提

データセット情報

list_datasets で取得できる項目を確認

get_dataset で取得できる項目を確認

テーブル / ビュー情報

list_tables で取得できる項目を確認

get_table で取得できる項目を確認

ルーティン情報

list_routines で取得できる項目を確認

get_routine で取得できる項目を確認

アクセス制限されているリソース情報の取得

INFORMATION_SCHEMA から取得できる情報と比較

まとめ（所感）

参考

関連記事

AWSで探す

注目のテーマ

プロダクトやサービスで探す

特集やシリーズから探す

EVENTS